Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Restrict pydantic 2.10.0 #44249

Merged
merged 2 commits into from
Nov 22, 2024
Merged

Conversation

gopidesupavan
Copy link
Member

Recent pydantic 2.10.0 release causing failures in CI.
It seems there is ticket already opened here. pydantic/pydantic#10910

Failing tests in CI https://github.com/apache/airflow/actions/runs/11954326679/job/33324739597#step:7:12365


^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

@gopidesupavan gopidesupavan added full tests needed We need to run full set of tests for this PR to merge all versions If set, the CI build will be forced to use all versions of Python/K8S/DBs labels Nov 21, 2024
@sydney-runkle
Copy link

We should have a patch release out this afternoon with a fix for the issues you're encountering :)

Copy link
Member

@ashb ashb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if we need to merge this or not given a 2.10.1 will be out soon

@ashb
Copy link
Member

ashb commented Nov 21, 2024

@sydney-runkle Do you plan on yanking the 2.10.0 release? (I don't mind either way, just need to know what we do)

If 2.10.0 gets yanked then we don't need to keep this/can revert it, otherwise we probably should keep the exclusion rule

hatch_build.py Outdated Show resolved Hide resolved
hatch_build.py Outdated Show resolved Hide resolved
Co-authored-by: Jarek Potiuk <jarek@potiuk.com>
@sydney-runkle
Copy link

We're not going to yank v2.10.0, will just release a v2.10.1 patch soon!

@sydney-runkle
Copy link

Perhaps you all could test against our main branch before we do our patch release to confirm that all is working as expected on your end?

@potiuk
Copy link
Member

potiuk commented Nov 21, 2024

Perhaps you all could test against our main branch before we do our patch release to confirm that all is working as expected on your end?

It would be easier if you have an rc or beta/alpha released in PyPI - because I am not sure if we can trigger the whole test suite against Github-installed version. Can it be done @sydney-runkle ?

@sydney-runkle
Copy link

sydney-runkle commented Nov 21, 2024

You can try with pydantic==git+https://github.com/pydantic/pydantic@main

@potiuk
Copy link
Member

potiuk commented Nov 21, 2024

You can try with pydantic==git+https://github.com/pydantic/pydantic@main

It might be that we expect version in our CI for that, so I am not sure if it's going to work . I can try, but I have a feeling it will fail in one of the steps.

@gopidesupavan
Copy link
Member Author

You can try with pydantic==git+https://github.com/pydantic/pydantic@main

It might be that we expect version in our CI for that, so I am not sure if it's going to work . I can try, but I have a feeling it will fail in one of the steps.

yeah i think it might fail at constraints ?

@potiuk
Copy link
Member

potiuk commented Nov 21, 2024

You can try with pydantic==git+https://github.com/pydantic/pydantic@main

It might be that we expect version in our CI for that, so I am not sure if it's going to work . I can try, but I have a feeling it will fail in one of the steps.

Trying it here @sydney-runkle #44260

@potiuk
Copy link
Member

potiuk commented Nov 21, 2024

You can try with pydantic==git+https://github.com/pydantic/pydantic@main

It might be that we expect version in our CI for that, so I am not sure if it's going to work . I can try, but I have a feeling it will fail in one of the steps.

Trying it here @sydney-runkle #44260

Yeah... some tests are already failing here as I suspected. But those are auxiliary ones, maybe the main part of the tests that actually failed before will be ok:

  Preparing metadata (pyproject.toml): finished with status 'done'
ERROR: Requested apache-airflow==3.0.0.dev0 from file:///home/runner/work/airflow/airflow has invalid metadata: Expected end or semicolon (after name and no valid version specifier)
    pydantic==git+https://github.com/pydantic/pydantic@main

@Viicos
Copy link

Viicos commented Nov 21, 2024

We're going to release 2.10.1 any minute now.

We got a report today of an issue happening on Python <3.10. The TaskInstancePydantic class is subclassing LoggingMixin. This class has an annotation for _log using the new 3.10+ union syntax. While Pydantic has always evaluated all type annotations of models including their bases, we were not using the correct globals and locals to so. Meaning with the following example:

a.py

from logging import Logger

class Base:
    _log: 'Logger | None'

b.py

from a import Base

from pydantic import BaseModel

class Model(BaseModel, Base):
    pass

Evalutating the annotation for _log would have resulted in a NameError, because Logger is imported in a.py, and we only used the globals of the b.py module until now. NameError's are ignored to allow for types to be defined later on. However, on 2.10, this now raises a TypeError because 'Logger | None' successfully evaluates (read: Logger can be resolved), but the union syntax is not supported on Python 3.9.

What's probably best is to make sure you're using the old syntax for every class that is going to be used in the context of a Pydantic model.

@potiuk potiuk merged commit fc52d7d into apache:main Nov 22, 2024
132 of 134 checks passed
@sydney-runkle
Copy link

Just released v2.10.1 with fixes for the issues you folks reported :)

@potiuk
Copy link
Member

potiuk commented Nov 22, 2024

What's probably best is to make sure you're using the old syntax for every class that is going to be used in the context of a Pydantic model.

As discussed in pydantic/pydantic#10924 (comment) - I think this change is breaking far too much of an existing code base and you are facing the reality that this will cause even more people to limit their Pydantic version to < 2 (it looks like the dbt team asked dbt-databricks maintainer to do so). While we might likely implement some fix to that (maybe limiting Pydantic to <2.10) in Airflow 2.10.4, this is not a good solution because Pydantic is so popular and used in many, many dependencies, and it's simply not reasonable to expect that existing released code will be updated. It might help if new versions of software are updated but I have a feeling even recent version of released software will start failing with similar issues. So I would seriously reconsider that "fix" and revert it or workaround it in 2.10.2 (at least that's what I'd do as a maintainer of such popular library).

@Viicos
Copy link

Viicos commented Nov 22, 2024

I think this change is breaking far too much of an existing code base

Since the release of 2.10, we only had one report (pydantic/pydantic#10924) related to evaluation of forward annotations.

Pydantic is so popular and used in many, many dependencies

We are aware of that, it is, in some cases, extremely challenging for us to make changes/cleanups, especially because in the field of forward annotations evaluation.

The standard library utilities are far from perfect, so we had to implement our own logic to support edge cases that unfortunately led to a messy implementation, accepting invalid annotations to evaluate (this can be dangerous as names in forward annotations could resolve to the wrong type) 1.

it's simply not reasonable to expect that existing released code will be updated

Honestly, this requires a one line change on your end:

class LoggingMixin:
"""Convenience super-class to have a logger configured with the class name."""
_log: logging.Logger | None = None
# Parent logger used by this class. It should match one of the loggers defined in the
# `logging_config_class`. By default, this attribute is used to create the final name of the logger, and
# will prefix the `_logger_name` with a separating dot.
_log_config_logger_name: Optional[str] = None # noqa: UP007
_logger_name: Optional[str] = None # noqa: UP007

Your usage of type annotations is inconsistent here, as you are mixing the new union syntax with Optional. Note that if this class were to be defined in the same module as TaskInstancePydantic, the issue would have appeared already.

So I would seriously reconsider that "fix" and revert it or workaround it in 2.10.2

The "fix" is not actually a fix, but a complete refactor of our forward annotations evaluation. It's unfortunately not possible for us to "revert" it. The current implementation is vastly more accurate, and as I said above will avoid dangerous issues when forward annotations are silently not resolved to the correct type.

If we do get more reports, depending on their validity, we will then consider tweaking the current logic.

Footnotes

  1. Read more here.

@Viicos
Copy link

Viicos commented Nov 22, 2024

For instance: langchain-ai/langchain-google#610 is an example of a forward annotation evaluation that resolved to the wrong type in <2.10 (I'm in the process of writing up what's happening).

@pierrejeambrun
Copy link
Member

pierrejeambrun commented Nov 22, 2024

What's probably best is to make sure you're using the old syntax for every class that is going to be used in the context of a Pydantic model.

Indeed this is not great. We have all the codebase updated to the new style type annotation. (|, list, etc...) and having to use the old one everywhere pydantic is involved is not great. (maybe I didn't understand correctly what you are suggesting).

but the union syntax is not supported on Python 3.9.

We use from future import __annotation__ in conjunction with eval-type-backport and has been working nicely for us to solve pydantic new style annotation + forward evaluation in python 3.9. Is that expected or will we still face an issue with the new release ?

@potiuk
Copy link
Member

potiuk commented Nov 22, 2024

If we do get more reports, depending on their validity, we will then consider tweaking the current logic.

Sure. It is indeed up to you. We are pretty well protected - we have "constraint mechanism" and we can restric airlfow 2.10.4 - and we are rather happy to adjust future versions of airflow, I was just merely suggesting that this change might have more consequences for you - so It was more of a friendly advice than complaint.

But yes - I agree my assesment might be wrong.

Your usage of type annotations is inconsistent here, as you are mixing the new union syntax with Optional. Note that if this class were to be defined in the same module as TaskInstancePydantic, the issue would have appeared already.

Yes - but to be perfectly honest it's because of Pydantic. We have a rule in our pre-commit that enforces using new style type annotations. Because of inability of pydantic to handle it, we specifically converted our Pydantic classes to not use from future import __annotations__ and we even had to add exeption to our rule:

https://github.com/apache/airflow/blob/main/pyproject.toml#L326

So our "inconistent use" is exclusively because of Pydantic. And our use is perfectly fine according to current state of type annotations - both style of annotations can be inter-mixed and from __future__ import annoutattion as defined in https://peps.python.org/pep-0563/ . And while I know there are a lot of controversies around that PEP and the way how it works will likely change in the future, this is the current state of affairs, and we are following the accepted PEPs (while Pydantic does not really handle them properly and we had to introduce inconsistency in order to workaround it).

So technically speaking it's because of Pydantic we are inconsistent.

And I am not complaining and not trying to heat it up, I am just stating the fact - explaining that well, it's Pydantic who is the root cause of that inconsistency, not Airflow.

So really - @Viicos - please treat it as a friendly advice, that maybe it's a good idea to consider supporting that case. While we will handle it ourselves (luckily our constraint mechanism we introduce many years ago and educate our users to use them will prevent a lot of our users from being affected). It's really a maintainer-to-maintainer friendly suggestion that this might hit you back.

@Viicos
Copy link

Viicos commented Nov 22, 2024

Indeed this is not great. We have all the codebase updated to the new style type annotation. (|, list, etc...) and having to use the old one everywhere pydantic is involved is not great. (maybe I didn't understand correctly what you are suggesting).

Yes sorry what I meant is precisely on the LoggingMixin class, where two annotations are using the old syntax, while the _log field is using the new one.

But as you mentioned, nothing related to this should be an issue if the eval-type-backport backport is used. What's interesting however, is that the initial report should not happen if the backport is indeed installed.

Turns out, the backport was added in 7d5f2ba, and the user is still on 2.9.0, released prior to the addition of the dependency. Sorry for the confusion.


so It was more of a friendly advice than complaint.

Sorry if I did not understand it that way. I think we both know how annoying unexpected breaking changes can be. In this case, the intent was to fix actual invalid cases (such as this one) and I would find it very unfortunate to consider Pydantic in the same state as other well established libraries (e.g. requests) where no change is possible because of https://xkcd.com/1172/.

Regarding the "inconsistent use" we are talking about, as I answered at the top of this comment, it was in fact due to the reporting user not using the latest airfow version which included the evaluation backport.

while Pydantic does not really handle them properly

Pydantic is not really to blame here. There are some PEP 563 use cases that are theoretically impossible to support (e.g. referencing locals in a function) (and we had/will have to wait for PEP 649 to resolve most of these cases). And compared to the other dataclass-like libraries performing validation, I'm not afraid to say Pydantic is by far the most accurate when it comes to handling forward annotations, including the eval-type-backport library (which solves most of the discussion here 😄).

Thanks again, I'll close the original issue by suggesting upgrading to the latest airflow version. Please let us know if you encounter more issues with 2.10.1.

@Viicos
Copy link

Viicos commented Nov 22, 2024

Sidenote:

Prior to the final 2.10 release, I was looking at libraries making extensive use of Pydantic. I tested on some, but wasn't aware airflow was using it, along with vertexai/langchain-google etc. We will definitely try to test future beta releases on these ones.

@pierrejeambrun
Copy link
Member

Prior to the final 2.10 release, I was looking at libraries making extensive use of Pydantic. I tested on some, but wasn't aware airflow was using it, along with vertexai/langchain-google etc. We will definitely try to test future beta releases on these ones.

Great to hear! Yes the usage in airflow started small but it's becoming more and more central in our datamodel.

But as you mentioned, nothing related to this should be an issue if the eval-type-backport backport is used. What's interesting however, is that the pydantic/pydantic#10924 should not happen if the backport is indeed installed.

Thanks for the details 👍

@potiuk
Copy link
Member

potiuk commented Nov 22, 2024

Yes. thanks for all details. Again, it was not blaming or complaint (and I know how difficult it is to pass emotions and intents via GitHub issues.

And yes - i know, and quote https://xkcd.com/1172/ more often than not. Also what gets handy in such case is https://www.hyrumslaw.com/ (which BTW also has the reference to the XKCD in question) :) .

Now all is clear. I've also learned about eval-type-backport now - it's been part of the much bigger https://github.com/apache/airflow/pull/42196/files#diff-d56d1b58c0ca28bdddd10c2cee2ab3cb4312766ea5de51f210dbc15a0fea42faR427 - and I even did not know that it existed.

FYI @Viicos - that one was added only in main and is going to be added as a requirement in Airflow 3.0. But as far as I understand, just installing it should generally solve the problem that the use had with databricks-dbt. So @pierrejeambrun -> maybe this is the right fix that we should add to 2.10.4 - simply add eval-type-backport to 2.10.4 to preven similar errors for users who install newer Pydantic versions ?

@potiuk
Copy link
Member

potiuk commented Nov 22, 2024

@pierrejeambrun Created #44294 -> I noticed that the eval-type-backport has been added for all python versions (and we really need it for < 3.10) - so I corrected it in #44294 - and we can backport it to 2.10.4 becuse #42196 was not cherry-pickable (learning for the future those kind of changes should be raher split from the main change - seems that eval-type-backport dependency deserved it's own PR.

@pierrejeambrun
Copy link
Member

Sounds great thanks Jarek.

Yes indeed when I first introduced it, I really didn’t know it would be all that important and have to be cherry picked.

It was really just to fix issues I was facing with initial development of the FastAPI API. Now looking back, indeed a PR of its own would have been nice 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
all versions If set, the CI build will be forced to use all versions of Python/K8S/DBs full tests needed We need to run full set of tests for this PR to merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants